COS 511 : Theoretical Machine Learning

Author

  • Haipeng Zheng
Abstract

$D_{t+1}(i) = \dfrac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$, where $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$, $\epsilon_t = \mathrm{err}_{D_t}(h_t)$, and $Z_t$ is a normalization factor.
end
Output: final/combined hypothesis $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.

Last time, we showed that the training error goes down very quickly with respect to the number of rounds of boosting $T$. However, what is the generalization error of AdaBoost, and how do we estimate an upper bound on it? The combined hypothesis has a "weighted-vote" form. Define $G = \{\text{all functions of the form } \mathrm{sign}(\sum_{t=1}^{T} \alpha_t h_t(x))\}$. If $H$ is consistent and $|G|$ is finite, then with probability $1-\delta$,

$$\mathrm{err}(H) \le O\!\left(\frac{\ln|G| + \ln(1/\delta)}{m}\right).$$

However, since the number of choices of $\alpha_t$ is uncountable, $|G|$ is infinite, and the above bound is therefore useless. Instead, we need to use the VC-dimension to bound the generalization error. Using the theorem from the previous two lectures and Homework 3, the bound is of the form

$$\mathrm{err}(H) \le \widehat{\mathrm{err}}(H) + O\!\left(\sqrt{\frac{\ln \Pi_G(2m) + \ln(1/\delta)}{m}}\right). \qquad (1)$$

On the right-hand side, the first term $\widehat{\mathrm{err}}(H)$ accounts for the error caused by an inconsistent $H$, and the second term $O\!\left(\sqrt{(\ln \Pi_G(2m) + \ln(1/\delta))/m}\right)$ is the hard part to prove. The key question now is to determine the growth function $\Pi_G(2m)$ for the combined hypothesis.
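For concreteness, the following is a minimal NumPy sketch of the AdaBoost procedure summarized above, assuming binary labels in {−1, +1}. The decision-stump weak learner and the toy data are illustrative choices of mine, not something the notes fix.

```python
# Minimal AdaBoost sketch (labels assumed in {-1, +1}); decision stumps are
# used as the weak learner purely for illustration -- the notes do not fix one.
import numpy as np

def train_stump(X, y, D):
    """Pick the threshold stump with the smallest weighted error under D."""
    n, d = X.shape
    best = None  # (weighted error, feature, threshold, sign)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = s * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(D[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, s)
    return best

def stump_predict(X, stump):
    _, j, thr, s = stump
    return s * np.where(X[:, j] <= thr, 1, -1)

def adaboost(X, y, T):
    n = len(y)
    D = np.full(n, 1.0 / n)                    # D_1: uniform over examples
    hs, alphas = [], []
    for t in range(T):
        stump = train_stump(X, y, D)
        eps = max(stump[0], 1e-12)             # eps_t = err_{D_t}(h_t)
        alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_t = (1/2) ln((1-eps_t)/eps_t)
        pred = stump_predict(X, stump)
        D = D * np.exp(-alpha * y * pred)      # D_{t+1}(i) ∝ D_t(i) e^{-alpha_t y_i h_t(x_i)}
        D /= D.sum()                           # Z_t: normalization factor
        hs.append(stump)
        alphas.append(alpha)
    return hs, alphas

def combined_predict(X, hs, alphas):
    # H(x) = sign(sum_t alpha_t h_t(x)) -- the "weighted vote"
    margin = sum(a * stump_predict(X, h) for a, h in zip(alphas, hs))
    return np.sign(margin)

# Toy usage: the training error typically drops quickly with the number of rounds T.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
hs, alphas = adaboost(X, y, T=20)
print("training error:", np.mean(combined_predict(X, hs, alphas) != y))
```

The bound in (1) concerns what happens beyond this training set: even when the training error above reaches zero, the generalization error is controlled only through the growth function of the class of weighted votes.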


Similar resources

Theoretical Machine Learning COS 511 Lecture #9

In this lecture we consider a fundamental property of learning theory: it is amenable to boosting. Roughly speaking, boosting refers to the process of taking a set of rough “rules of thumb” and combining them into a more accurate predictor. Consider for example the problem of Optical Character Recognition (OCR) in its simplest form: given a set of bitmap images depicting hand-written postal-cod...


COS 511 : Theoretical Machine Learning

In other words, if ε ≤ 1/8 and δ ≤ 1/8, then PAC learning is not possible with fewer than d/2 examples. The outline of the proof is: to prove that there exists a concept c ∈ C and a distribution D, we are going to construct a fixed distribution D, but we do not know the exact target concept c used. Instead, we will choose c at random. If we get an expected probability of error over c, then there ...
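To make the quantifiers in this outline explicit, here is a hedged LaTeX restatement of the lower bound it sketches; reading d as the VC-dimension of the concept class C is my assumption, since the excerpt is truncated before d is defined.

```latex
% Hedged restatement of the lower bound sketched above.
% Assumption: d = VCdim(C); the truncated excerpt does not say so explicitly.
\[
  \forall A \;\; \exists\, c \in C,\; \exists\, D :\qquad
  m < \tfrac{d}{2}
  \;\Longrightarrow\;
  \Pr_{S \sim D^m}\!\bigl[\, \mathrm{err}_D(h_{A,S}) > \tfrac{1}{8} \,\bigr] \;>\; \tfrac{1}{8},
\]
% i.e. PAC learning with accuracy epsilon <= 1/8 and confidence delta <= 1/8
% is impossible from fewer than d/2 examples.
```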


COS 511 : Theoretical Machine Learning

Suppose we are given examples x_1, x_2, . . . , x_m drawn from a probability distribution D over some discrete space X. In the end, our goal is to estimate D by finding a model which fits the data, but is not too complex. As a first step, we need to be able to measure the quality of our model. This is where we introduce the notion of maximum likelihood. To motivate this notion suppose D is distribu...
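The excerpt is cut off before it names a concrete family of distributions, so the biased-coin model below is my own illustrative choice; the sketch only shows the maximum-likelihood criterion itself: score a candidate model by the log-probability it assigns to the observed sample.

```python
# Illustrative sketch of the maximum-likelihood criterion described above.
# The Bernoulli / biased-coin family is an assumption of this example.
import numpy as np

rng = np.random.default_rng(1)
samples = rng.random(1000) < 0.3      # m draws from D = Bernoulli(0.3), encoded as booleans
m, k = len(samples), samples.sum()    # k = number of ones observed

def log_likelihood(p):
    """Log-probability that the model Bernoulli(p) assigns to the observed sample."""
    return k * np.log(p) + (m - k) * np.log(1 - p)

# Scan candidate models: the likelihood peaks near the empirical frequency k/m,
# which is the maximum-likelihood estimate for this family.
grid = np.linspace(0.01, 0.99, 99)
best_p = grid[np.argmax([log_likelihood(p) for p in grid])]
print("empirical frequency:", k / m, " MLE over the grid:", best_p)
```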


COS 511 : Theoretical Machine Learning

as the price relative, which is how much a stock goes up or down in a single day. S_t denotes the amount of wealth we have at the start of day t, and we assume S_1 = 1. We denote w_t(i) to be the fraction of our wealth that we have in stock i at the beginning of day t, which can be viewed as a probability distribution since ∀i, w_t(i) ≥ 0 and ∑_i w_t(i) = 1. We can then derive the total wealth in stock i a...
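A small sketch of this bookkeeping follows. The wealth recurrence S_{t+1} = S_t · ∑_i w_t(i) x_t(i), with x_t(i) the price relatives, is reconstructed from the standard online-portfolio setup, since the excerpt is cut off before stating it; the uniform portfolio and random price relatives are purely illustrative.

```python
# Sketch of the portfolio bookkeeping described above.  The wealth recurrence
# S_{t+1} = S_t * sum_i w_t(i) x_t(i) is an assumption taken from the standard
# online portfolio setting; the excerpt is truncated before it appears.
import numpy as np

rng = np.random.default_rng(2)
T, n = 5, 3                              # days, stocks
x = rng.uniform(0.9, 1.1, size=(T, n))   # x_t(i): price relatives for each day and stock

S = 1.0                                  # S_1 = 1: initial wealth
w = np.full(n, 1.0 / n)                  # a fixed uniform portfolio (one valid choice of w_t)
for t in range(T):
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)   # w_t is a probability distribution
    S *= np.dot(w, x[t])                 # wealth at the start of day t+1
    print(f"day {t + 1}: wealth S = {S:.4f}")
```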


COS 511 : Theoretical Machine Learning

Last class, we discussed an analogue of Occam's Razor for infinite hypothesis spaces that, in conjunction with VC-dimension, reduced the problem of finding a good PAC-learning algorithm to the problem of computing the VC-dimension of a given hypothesis space. Recall that VC-dimension is defined using the notion of a shattered set, i.e. a subset S of the domain such that Π_H(S) = 2^|S|. In this le...
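The following small Python sketch spells out the shattering condition recalled above, Π_H(S) = 2^|S|, by enumerating the labelings a hypothesis class induces on a finite set S. One-sided threshold classifiers on the real line are my illustrative choice of H, not the notes'.

```python
# Sketch of the shattering check recalled above: S is shattered by H iff the
# number of distinct labelings H induces on S equals 2^|S|.
def labelings(H, S):
    """Set of distinct label vectors that hypotheses in H induce on the points S."""
    return {tuple(h(x) for x in S) for h in H}

# H = one-sided thresholds h_t(x) = 1 if x >= t else 0, over a grid of thresholds t.
thresholds = [i / 10 for i in range(-20, 21)]
H = [(lambda t: (lambda x: 1 if x >= t else 0))(t) for t in thresholds]

for S in ([0.5], [0.3, 0.7]):
    count = len(labelings(H, S))
    print(f"S = {S}: |Pi_H(S)| = {count}, shattered = {count == 2 ** len(S)}")
```

As expected for thresholds, a single point is shattered but a pair is not (the labeling (1, 0) is unachievable), which is the usual way of arguing that this class has VC-dimension 1.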



Journal title:

Volume   Issue

Pages  -

Publication date: 2008